Online Stochastic Matching: Beating $1 - 1/e$

Jon Feldman, Aranyak Mehta, Vahab Mirrokni, S. Muthukrishnan

May 26, 2009 

Abstract 

We study the online stochastic bipartite matching problem, in a form motivated by display ad 
allocation on the Internet. In the online, but adversarial case, the celebrated result of Karp, Vazirani
and Vazirani gives an approximation ratio of $1 - 1/e \approx 0.632$, a very familiar bound that holds for many
online problems; further, the bound is tight in this case. In the online, stochastic case when nodes are
drawn repeatedly from a known distribution, the greedy algorithm matches this approximation ratio,
but still, no algorithm is known that beats the $1 - 1/e$ bound.

Our main result is a 0.67-approximation online algorithm for stochastic bipartite matching, breaking
this $1 - 1/e$ barrier. Furthermore, we show that no online algorithm can produce a $(1 - \epsilon)$-approximation
for an arbitrarily small $\epsilon$ for this problem.

Our algorithms are based on computing an optimal offline solution to the expected instance, and 
using this solution as a guideline in the process of online allocation. We employ a novel application 
of the idea of the power of two choices from load balancing: we compute two disjoint solutions to the 
expected instance, and use both of them in the online algorithm in a prescribed preference order. To 
identify these two disjoint solutions, we solve a max flow problem in a boosted flow graph, and then 
carefully decompose this maximum flow into two edge-disjoint (near-)matchings. In addition to guiding
the online decision making, these two offline solutions are used to characterize an upper bound for the 
optimum in any scenario. This is done by identifying a cut whose value we can bound under the arrival 
distribution. 

At the end, we discuss extensions of our results to more general bipartite allocations that are
important in a display ad application.
1 Introduction

Bipartite matching problems are central in combinatorial optimization with many applications. Our motivating application is the allocation of display advertisements on the Internet,¹ and so we will use the
language of this application to define and discuss the problem:

(Online Bipartite Matching) There is a bipartite graph $G(A, I, E)$ with advertisers $A$ and impressions $I$,
and a set $E$ of edges between them. Advertisers in $A$ are fixed and known. Impressions (or requests) in $I$
(along with their incident edges) arrive online. Upon the arrival of an impression $i \in I$, we may assign $i$
to some advertiser $a \in A$ with $(i, a) \in E(G)$. At all times, the set of assigned edges must form a matching
(that is, no two assigned edges share an endpoint). □

If the online algorithm knows nothing about I or E beforehand, and the impressions arrive in an 
arbitrary order, we have the adversarial model. Then, Karp, Vazirani and Vazirani [14] solved this problem
by presenting an online algorithm with an approximation ratio of $1 - 1/e \approx 0.632$, and further showed
that no algorithm can achieve a better ratio.

A different model is the online, stochastic one called the iid model, where impressions $i \in I$ arrive
online according to some known probability distribution (with repetition). In other words, in addition to $G$,
we are given a probability distribution $\mathcal{D}$ over the elements of $I$. Our goal is then to compute a maximum
matching on $G = (A, I, E)$, where $I$ is drawn from $\mathcal{D}$.² In this iid model, the greedy algorithm achieves
an approximation ratio of $1 - 1/e$ [12, 1]. Nothing better is known.
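To make the iid model concrete, here is a minimal Python sketch (our own illustration, not code from the paper): it draws $n = \sum_i e_i$ arrivals with $\Pr[i] = e_i/n$ and matches each arrival to a free neighbour. The function name `greedy_iid`, the edge-list input format and the tie-break (alphabetically first free neighbour) are hypothetical choices; the greedy rule discussed above allows any free neighbour.

```python
import random


def greedy_iid(edges, expected, seed=0):
    """Simulate the iid model and match each arrival greedily (illustrative sketch).

    edges    -- list of (ad, impression_type) pairs (hypothetical input format)
    expected -- dict impression_type -> e_i; n = sum of e_i and Pr[i] = e_i / n
    Returns (arrivals, taken): taken maps each matched ad to the index of
    the arrival assigned to it.
    """
    rng = random.Random(seed)
    types = sorted(expected)
    n = sum(expected.values())
    # n i.i.d. draws from the distribution D with Pr[i] = e_i / n
    arrivals = rng.choices(types, weights=[expected[i] for i in types], k=n)
    # neighbours of each type, sorted so that the tie-break is deterministic
    nbrs = {i: sorted(a for a, j in edges if j == i) for i in types}
    taken = {}
    for idx, i in enumerate(arrivals):
        for a in nbrs[i]:
            if a not in taken:  # first free neighbour gets the impression
                taken[a] = idx
                break
    return arrivals, taken
```

Any other tie-break rule can be swapped in by changing how `nbrs[i]` is ordered.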

Another stochastic model is the random order model, where we assume that $I$ is unknown, but impressions
in $I$ arrive in a random order. This has proved to be an important analytical construct for other problems,
such as secretary-type problems, where worst cases are inherently difficult. It is known that in this case
even the greedy algorithm has a (tight) competitive ratio of $1 - 1/e$ [12]. Further, no deterministic algorithm
can achieve an approximation ratio better than 0.75 and no randomized algorithm better than 0.83 [12].
Currently the best known approximation ratio remains $1 - 1/e$.

Can one beat the 1 — 1/e bound? We address this main question. 

1.1 Our Results and Techniques. We present two results for the online stochastic bipartite matching 
problem under the iid model. 

• We present an algorithm with an approximation factor of $\frac{1 - 2/e^2}{4/3 - 2/(3e)} \approx 0.67$, breaking past the $1 - 1/e$
bottleneck. We also show that our analysis is tight, by providing an example for which our algorithm
achieves exactly this factor.

• We show that there is no $(1 - o(1))$-approximation algorithm for this problem. Specifically, we show
that no online algorithm can achieve an approximation factor better than $26/27$ (or $\approx 0.99$ if one requires
a family of instances that grows with $n$).

Our algorithms are based on computing an optimal offline solution, and using it to guide online allocation.
An intuitive approach under this paradigm is to compute a matching $M_{\text{off}}$ on the "expected
graph", that is, the one that would result if all impressions occurred exactly as many times as expected.
Thereafter, one can use this matching online; that is, when node $i \in I$ arrives, match it with $a \in A$ iff
$(i, a) \in M_{\text{off}}$. One expects this to perform well if the empirical probability of occurrence of each node
$i \in I$ is very close to its value in the distribution. This can be shown, using for example the Chernoff bound,
if all $i \in I$ occur very frequently. However, in general, many $i \in I$ will have very low frequency. In
this paper, we show that this first attempt achieves (you guessed it) $1 - 1/e$, and this is tight.

1 For details of this application, see Section 1.3.

2 We give more details on this model in Section 2, including a discussion of different ways to characterize an approximation
ratio in this context.

To get our main result and beat $1 - 1/e$, we compute two disjoint offline solutions and use them as
follows: when a request arrives, we try to assign it based on the first offline solution, and if that assignment
fails, we try the second. In order to identify these two disjoint offline solutions, we solve a max flow
problem in a boosted flow graph, and then carefully decompose this maximum flow into two edge-disjoint
(near-)matchings. Besides guiding the online decision making, these offline solutions are used to characterize
an upper bound for the optimum in each scenario. This bound is determined by identifying an
appropriate cut in each scenario that is guided by a cut in the offline solution. This is the main technical
part of the analysis, and we hope this technique proves useful for analyzing heuristic algorithms for other
stochastic optimization problems.³

The idea of using two solutions is inspired by the idea of power of two choices in online load bal- 
ancing SOI. Power of two choices has traditionally meant choosing between two random choices for 
online allocation; in contrast, we use two deterministic choices, carefully computed offline to guide online
allocation.⁴

Our results are somewhat more general as shown in the technical sections, and the problem itself was 
motivated from an Internet ad application described later. 

1.2 Other Related Work. Our online stochastic matching problem is an example of online decision 
making problems studied in the Operations Research literature as stochastic approximate dynamic programming
problems [4, 5, 8, 10]. Several heuristic methods have been proposed for such problems (e.g.,
see Rollout algorithms for stochastic dynamic programming in [4]), but we are not aware of any rigorous 
analysis of the performance of the heuristics. Recently other online stochastic combinatorial optimization 
problems like Steiner tree and set cover problems have been studied in the iid model [13, 11]; one can 
achieve an approximation factor better than the best bound for the adversarial online variant. 

A related ad allocation problem is the Adwords assignment problem [16] that was motivated by sponsored
search auctions. When modeled as an online bipartite assignment problem, each edge has a
weight, and there is a budget on each ad representing the upper bound on the total weight of edges that may
be assigned to it. In the offline setting, this problem is NP-hard, and several approximations have been
designed 0H9H21. For the online setting, it is typically assumed that every weight is very small compared 
to the corresponding budget, in which case there exist $1 - 1/e$ factor online algorithms [16, 6, 12, 1].
Recently, it has been brought to our attention that an online algorithm [9] gives a $(1 - \epsilon)$-approximation,
for any $\epsilon > 0$, for Adwords assignment when OPT is larger than each bid by a sufficiently large factor
depending on $\epsilon$, in the iid and random permutation models. Thus, technically, our problem is different
from their problem in two ways: the edges are unweighted (making it easier), but OPT is not necessarily
much larger than each bid (making it harder; in the bipartite graph case, OPT can be $O(n)$). Moreover, our
offline problem is solvable in polynomial time, and we show that no $(1 - \epsilon)$-approximation can be achieved
for our problem for some fixed $\epsilon > 0$. In fact, their algorithm, along with other previously studied algorithms
(e.g., algorithms based on greedy, greedy bid-scaling, and primal-dual techniques), does not achieve a factor
better than $1 - 1/e$ for our problem, and we beat the $1 - 1/e$ factor using a different technique. An interesting
related model for combining stochastic-based and online solutions for the Adwords problem is considered
in [15], but their approach does not give an improved approximation algorithm for the iid model.



3 For example, this technique might be applicable for proving performance guarantees for heuristics for approximate dynamic
programming problems studied in the OR literature [5, 4, 8, 10].

4 Previously, power of two choices has been used in various congestion control and load balancing settings. Our work is a 
novel adaptation of this idea to a stochastic bipartite matching setting. 
1.3 Applied Motivation: Display Ad Allocation. Our motivation is in part applied and arises from
allocation of "display ads" on the Internet. Here is a high-level view. Websites have multiple pages (e.g.,
sports, real estate, etc.), and several slots where they can display ads (say an image or video or a block of
text). Each user who views one of these pages is shown ads, i.e., the ads get what is called an "impression." 
Advertisers pay the website per impression and buy them (typically in lots of one thousand) ahead of time, 
often specifying a subset of pages on which they would like their ad to appear, or a type of user they wish 
to target. All such sales are entered into an ad delivery system (ADS). 

Since the ADS serves ads on the same web pages from day to day, it has an idea of the traffic that
occurs on these websites. While there are inaccuracies and indeed it is nearly impossible to forecast the 
number of viewers of a webpage in the future, it is standard industry practice to use these estimates at the 
time of selling inventory to various advertisers (to judge whether a new sale can be accommodated). 

When a user visits one of the pages, the ADS determines the set of eligible ads for that slot, and selects 
an ad to be shown. Since not all ads are suitable for each page or slot, we have an online (in two senses of 
the word) bipartite matching scenario. The ADS would like to maximize the number of impressions that
are filled with ads in order to satisfy its contracts, and thus maximize its revenue.

The underlying problem is an online bipartite matching problem in the iid model. Each $i \in I$ is an
"impression type," which may represent a particular web page, or even a cross product of targeting criteria
(location, demographic, etc.). Edges $(a, i)$ then capture the fact that advertiser $a$ was interested in an
impression of type $i$. Using past traffic data, the ADS defines $e_i$ to be the typical number of impressions
it gets of type $i$. Then, the distribution $\mathcal{D}$ over $I$ is given by $\Pr[i] = e_i/n$, where $n = \sum_{i \in I} e_i$.

In contrast to sponsored search, the display ad business is easier to model, since currently, display 
ads are not sold via auctions, and prices are the same for different impressions of an advertiser (so we do 
not need to worry about the underlying auction pricing schemes). Differing values of ad slots to different 
advertisers is handled exogenously via sales contracts, and the online problem is just to assign edges to 
meet the contracted sales. Still, we note that there are many aspects of online ad serving that deserve 
a richer model than the one we give here, and indeed there is more work to be done in this area. For 
example, the ADS may want to maximize the value of the contracts fulfilled, rather than the total number 
of impressions, or may want to maximize some notion of quality of ads served. One extension that we 
address is frequency capping, which we discuss in the conclusion. As such, display ad selection problems 
are solved routinely by ADSs, and any insights or solutions we develop for our problem are likely to be 
useful in practice. 

2 Preliminaries 

Consider the following online stochastic matching problem in the i.i.d. model: We are given a bipartite
graph $G = (A, I, E)$ over advertisers $A$ and impression types $I$. Let $k = |A|$ and $m = |I|$. We are
also given, for each impression type $i \in I$, an integer number $e_i$ of impressions we expect to see. Let
$n = \sum_{i \in I} e_i$. We use $\mathcal{D}$ to denote the distribution over $I$ defined by $\Pr[i] = e_i/n$.

An instance $\mathcal{I} = (G, \mathcal{D}, n)$ of the online stochastic matching problem is as follows: We are given
offline access to $G$ and the distribution $\mathcal{D}$. Online, $n$ i.i.d. draws of impressions $i \sim \mathcal{D}$ arrive, and we
must immediately assign impression $i$ to some advertiser $a$ with $(a, i) \in E$, or not assign $i$ at all.
Each advertiser $a \in A$ may be assigned at most once.⁵ Our goal is to assign arriving impressions
to advertisers so as to maximize the total number of assigned impressions. In the following, we formally
define the objective function of the algorithm.

5 All results in this paper hold for a more general case in which each advertiser $a$ has a capacity $c_a$ and advertiser $a$ can be assigned
at most $c_a$ times. This more general case can be reduced easily to the degree-one case by replacing each node $a$ with $c_a$ copies
in the instance.
Let $D(i)$ be the set of draws of impression type $i$ that arrive during the run of the algorithm. We let a
scenario $\hat{I} = \bigcup_{i \in I} D(i)$ be the set of impressions. Let $\hat{G}(\hat{I})$ be the "realization" graph, i.e., with node sets
$A$ and $\hat{I}$, and edges $\hat{E} = \{(a, i') : (a, i) \in E, i' \in D(i)\}$.

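Given an instance $\mathcal{I} = (G, \mathcal{D}, n)$ of the online matching problem, let $\mathrm{ALG}(\hat{I})$ and $\mathrm{OPT}(\hat{I})$ denote the
size of the matching found by the algorithm and the size of the maximum matching in $\hat{G}(\hat{I})$, respectively.
We seek an algorithm ALG such that, for any instance $\mathcal{I}$ of the online matching problem, with high probability
$\frac{\mathrm{ALG}(\hat{I})}{\mathrm{OPT}(\hat{I})} \ge \alpha$. In this case, we say that the algorithm achieves approximation factor $\alpha$ with high probability.
One could also study weaker notions of approximation, namely $\frac{\mathbb{E}[\mathrm{ALG}(\hat{I})]}{\mathbb{E}[\mathrm{OPT}(\hat{I})]}$ (the approximation factor in
expectation), or $\mathbb{E}\left[\frac{\mathrm{ALG}(\hat{I})}{\mathrm{OPT}(\hat{I})}\right]$ (the expected approximation factor). Note that if one proves a
high-probability factor of $\alpha$, it implies an approximation factor in expectation, and an expected approximation
factor of at least $\alpha - o(1)$.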

2.1 Balls in Bins. In this section we characterize two useful extensions of the standard balls-in-bins 
problem, where we are interested in the distribution of certain functions of the bins. We characterize the 
expectations of these functions, and use Azuma's inequality on appropriately defined Doob martingales
to establish concentration results as needed. In particular, we will use the following facts. The proofs are 
left to the appendix. 

Fact 1. Suppose $n$ balls are thrown into $n$ bins, i.i.d. with uniform probability over the bins. Let $B$ be a
particular subset of the bins, and $S$ be a random variable that equals the number of bins from $B$ with at
least one ball. For any $\epsilon > 0$, with probability at least $1 - 2e^{-\epsilon^2 n/2}$, we have
$$|B|\left(1 - \frac{1}{e}\right) - \epsilon n \le S \le |B|\left(1 - \frac{1}{e} + \frac{1}{en}\right) + \epsilon n.$$
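The expectation bounds inside Fact 1 come from $(1 - 1/n)^n \le 1/e \le (1 - 1/n)^{n-1}$: a fixed bin is non-empty with probability $1 - (1 - 1/n)^n$, which lies between $1 - 1/e$ and $1 - 1/e + 1/(en)$. The Python check below (function names are ours) compares these quantities numerically; it does not cover the concentration part, which is the Azuma step.

```python
import math


def nonempty_prob(n):
    """Probability that a fixed bin receives at least one of n balls thrown
    uniformly into n bins, so that E[S] = |B| * nonempty_prob(n)."""
    return 1 - (1 - 1 / n) ** n


def fact1_expectation_bounds(n):
    """Per-bin lower and upper factors from Fact 1 (without the eps * n slack)."""
    lower = 1 - 1 / math.e
    upper = 1 - 1 / math.e + 1 / (math.e * n)
    return lower, upper


for n in (1, 2, 10, 1000):
    lo, hi = fact1_expectation_bounds(n)
    print(n, round(lo, 4), round(nonempty_prob(n), 4), round(hi, 4))
```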

Fact 2. Suppose $n$ balls are thrown into $n$ bins, i.i.d. with uniform probability over the bins. Let
$B_1, B_2, \ldots, B_\ell$ be ordered sequences of bins, each of size $c$, where no bin is in more than $d$ such
sequences. Fix some arbitrary subset $\mathcal{R} \subseteq \{1, \ldots, c\}$. We say that a bin sequence $B_a = (b_1, \ldots, b_c)$ is
"satisfied" if (i) at least one of its bins $b_i$ with $i \notin \mathcal{R}$ has at least one ball in it; or (ii) at least one of
its bins $b_i$ with $i \in \mathcal{R}$ has at least two balls in it. Let $S$ be a random variable that equals the number of
satisfied bin sequences. With probability at least $1 - 2e^{-\epsilon^2 n/2}$, we have
$$S \ge \ell\left(1 - \frac{2^{|\mathcal{R}|}}{e^c}\right) - \epsilon d n - O(1).$$
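The main term of Fact 2 can be checked directly for a single sequence. Assuming the $c$ bins of a sequence are distinct, the sketch below computes the exact probability that the sequence is not satisfied and compares it with the limiting value $2^{|\mathcal{R}|}/e^c$; for $c = 2$ and $|\mathcal{R}| = 1$ this is the constant $2/e^2$ that governs ads with one blue and one red edge in Section 4.2.2. The function names are hypothetical.

```python
import math


def unsatisfied_prob(n, c, r):
    """Exact Pr[a fixed sequence of c distinct bins is NOT satisfied] when n balls
    are thrown uniformly into n bins and r of its c positions lie in R.

    "Not satisfied" means: the c - r bins outside R are empty, and each of
    the r bins in R holds at most one ball.
    """
    total = 0.0
    for j in range(r + 1):
        # j of the r bins in R receive exactly one ball each: choose the bins,
        # then an ordered choice of j distinct balls; the other n - j balls
        # must avoid all c bins of the sequence.
        ways = math.comb(r, j) * math.perm(n, j)
        total += ways / n ** j * (1 - c / n) ** (n - j)
    return total


def limit_unsatisfied(c, r):
    """Limit of unsatisfied_prob(n, c, r) as n grows: 2^r / e^c."""
    return 2 ** r / math.e ** c


for c, r in ((1, 0), (1, 1), (2, 0), (2, 1)):
    print(c, r, round(unsatisfied_prob(10 ** 5, c, r), 5), round(limit_unsatisfied(c, r), 5))
```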



3 Hardness 

In this section, we show that the expected approximation factor of every (randomized) online algorithm is
bounded strictly away from 1.

Consider the 6-cycle $G$ defined by $A = \{a, b, c\}$, $I = \{x, y, z\}$, and $E = \{(x, a), (y, a), (y, b), (z, b),
(z, c), (x, c)\}$. The distribution $\mathcal{D}$ is the uniform distribution $(1/3, 1/3, 1/3)$ on $I$, and $n = 3$. We show
that no (randomized) algorithm can achieve an expected approximation factor better than $26/27$ on this
instance. Without loss of generality (from the symmetry of the 6-cycle), assume that the first impression
to arrive is $x$ and that it gets assigned to advertiser $a$. Now, if the next two arrivals are both of impression
$y$, then any algorithm will only be able to assign one of these. The optimal assignment for the scenario
$(x, y, y)$ is to assign $x$ to $c$, and the two $y$ impressions to $a$ and $b$. Since the probability of $(x, y, y)$, conditioned
on the first arrival being $x$, is $1/9$, the expected approximation factor is at most $(1/9)(2/3) + (8/9) \cdot 1 = 26/27$.
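The argument above gives an upper bound for every online algorithm. As a sanity check, the Python sketch below (our own, not from the paper) computes by exhaustive search the best value of $\mathbb{E}[\mathrm{ALG}/\mathrm{OPT}]$ reachable by any deterministic online policy on this 6-cycle instance with $n = 3$; for a fixed arrival distribution, a randomized algorithm is a mixture of deterministic ones and cannot do better. By our own hand count the search returns $25/27$, which respects the $26/27$ bound: after $x \to a$, besides $(x, y, y)$, one of the continuations of $(x, z)$ is also unavoidably lost.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product

# The 6-cycle instance: ads a, b, c and impression types x, y, z.
NEIGHBOURS = {"x": ("a", "c"), "y": ("a", "b"), "z": ("b", "c")}
TYPES = ("x", "y", "z")
N = 3  # number of i.i.d. uniform arrivals


def offline_opt(seq):
    """Maximum matching size in the realization graph, by brute force."""
    best = 0
    # each arrival is either left unmatched (None) or sent to one neighbour
    for choice in product(*[(None,) + NEIGHBOURS[t] for t in seq]):
        used = [ad for ad in choice if ad is not None]
        if len(used) == len(set(used)):  # no ad is used twice
            best = max(best, len(used))
    return best


@lru_cache(maxsize=None)
def best_online(prefix, taken):
    """Largest achievable E[ALG/OPT] given the arrivals so far and the taken ads."""
    if len(prefix) == N:
        return Fraction(len(taken), offline_opt(prefix))
    total = Fraction(0)
    for t in TYPES:  # the next arrival is uniform over the three types
        seq = prefix + (t,)
        options = [best_online(seq, taken)]  # leave the impression unassigned
        for ad in NEIGHBOURS[t]:
            if ad not in taken:
                options.append(best_online(seq, taken | {ad}))
        total += max(options) / len(TYPES)
    return total


print(best_online((), frozenset()))
```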

To get a family of instances on which no algorithm can do better than a constant bounded away from
1, we construct an instance consisting of a large number $k$ of copies of the 6-cycle. Using this idea,
we can prove the following theorem. The details of the proof are left to the appendix.

Theorem 3. There is an instance of the online stochastic matching problem in which no algorithm can
achieve an expected approximation factor better than $26/27$. Moreover, there exists a family of instances with
$n \to \infty$ for which no algorithm can achieve an expected approximation factor of $1 - o(1)$.
4 Offline Algorithms for Online Matching

In this section, we present our improved online algorithms guided by offline solutions. Before stating
the improved approximation result, we "warm up" with a simple, natural algorithm that uses the idea of
computing an offline solution to "guide" our online choices. This algorithm only achieves a $1 - 1/e$
approximation (which is tight). The proof of this part illustrates the framework we will use in the second
part of this section to beat $1 - 1/e$; however, we will need a new idea to achieve this, namely, the use of a
second offline solution.

4.1 "Suggested Matching" Algorithm: a $(1 - 1/e)$-Approximation. The suggested matching algorithm
is a first attempt at the approach of using an offline solution for online matching. In this algorithm, we 
simply find a maximum matching in the graph we "expect" to arrive, then restrict our online choices to 
this matching. 

Offline Algorithm. We will describe this algorithm more formally in terms of the standard characterization
of $b$-matching as a max-flow problem, since we will later use this flow graph explicitly to bound OPT.
Given an instance $\mathcal{I} = (G(A, I, E), \mathcal{D}, n)$ of the problem, we find a max-flow in a graph $G_f$ constructed
from $G$ as follows: define a new source node $s$ and an edge $(s, a)$ with capacity 1 to each $a \in A$,
direct all edges in $E$ from $A$ to $I$, and add a sink node $t$ with edges $(i, t)$ from each $i \in I$ with capacity $e_i$.
Let $f_{ai} \in \{0, 1\}$ be the flow on edge $(a, i)$ in this max flow (since all the capacities are integers, we may
assume that the resulting flow is integral [18]). For ease of notation, we say $f_{ai} = 0$ if edge $(a, i) \notin E$.
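A self-contained Python sketch of the offline step follows. It builds $G_f$ and runs a plain Edmonds-Karp max flow (our choice of max-flow routine; any integral max-flow algorithm would do). The function names, the node tags `("ad", a)` and `("imp", i)`, and the capacity 1 placed on the edges $(a, i)$ are assumptions of this illustration.

```python
from collections import defaultdict, deque


def max_flow(cap, s, t):
    """Edmonds-Karp max flow. cap maps (u, v) -> integer capacity.
    Returns (value, flow), flow mapping (u, v) -> units sent on that edge.
    Assumes there are no antiparallel edge pairs, which holds for G_f."""
    res = defaultdict(int)  # residual capacities
    nbrs = defaultdict(set)
    for (u, v), c in cap.items():
        res[(u, v)] += c
        nbrs[u].add(v)
        nbrs[v].add(u)  # allows pushing flow back along (v, u)
    value = 0
    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in nbrs[u]:
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[e] for e in path)
        for u, v in path:
            res[(u, v)] -= push
            res[(v, u)] += push
        value += push
    flow = {e: c - res[e] for e, c in cap.items() if c - res[e] > 0}
    return value, flow


def offline_flow_edges(edges, sink_cap, source_cap=1):
    """Build G_f (s -> ad: source_cap, ad -> type: 1, type -> t: sink_cap[type])
    and return the max-flow value and the sorted (ad, type) edges carrying flow."""
    cap = {}
    for a, i in edges:
        cap[("s", ("ad", a))] = source_cap
        cap[(("ad", a), ("imp", i))] = 1
        cap[(("imp", i), "t")] = sink_cap[i]
    value, flow = max_flow(cap, "s", "t")
    used = sorted((u[1], v[1]) for (u, v) in flow if u != "s" and v != "t")
    return value, used
```

Passing `source_cap=2` and a sink capacity of 2 for every type builds the boosted graph of Section 4.2.1 instead.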

Online Algorithm. When an impression $i' \in D(i)$ arrives online, we choose a random ad $a'$ according to
the distribution defined by the flow; i.e., the probability of choosing $a'$ is $f_{a'i}/e_i$. (Note that if $\sum_a f_{ai} < e_i$,
there is some probability that no $a'$ is chosen.) If $a'$ is already taken, we do not match $i'$ to any ad.⁶
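The online rule can be sketched in a few lines of Python. Because $f_{ai} \in \{0, 1\}$ and $\sum_a f_{ai} \le e_i$, drawing a uniform slot $k \in \{0, \ldots, e_i - 1\}$ and letting the first few slots stand for the flow ads of $i$ reproduces the probabilities $f_{ai}/e_i$; this slot encoding and the function name are our own illustrative choices.

```python
import random
from collections import defaultdict


def suggested_matching_online(arrivals, flow_edges, expected, rng):
    """Online phase of the suggested matching algorithm (illustrative sketch).

    flow_edges -- (ad, type) pairs with f_ai = 1 in the offline max flow
    expected   -- dict type -> e_i
    Each arrival of type i draws k uniformly from {0, ..., e_i - 1}; slot k
    selects the k-th flow ad of i (probability 1 / e_i each) and any larger
    k selects no ad. The selected ad is used only if it is still free.
    Returns the set of matched ads.
    """
    ads_of = defaultdict(list)
    for a, i in flow_edges:
        ads_of[i].append(a)
    taken = set()
    for i in arrivals:
        k = rng.randrange(expected[i])
        if k < len(ads_of[i]) and ads_of[i][k] not in taken:
            taken.add(ads_of[i][k])
    return taken
```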

Bounding ALG. The performance of this algorithm is easily characterized with high probability in terms
of the computed max-flow. Define $F_a = \sum_i f_{ai}$, and note that $F_a \in \{0, 1\}$; this indicates whether ad $a$
was chosen in the max flow. Let $A^* = \{a \in A : F_a = 1\}$. When an impression of type $i \in I$ arrives online,
a particular ad $a$ with $f_{ai} = 1$ has probability $1/e_i$ of being chosen by the online algorithm; since each
impression type $i$ has probability $e_i/n$ of arriving, we conclude that each $a \in A^*$ has probability $1/n$ of being
chosen by the online algorithm upon each arrival. Thus, to bound the total number of ads chosen we have a
balls-in-bins problem with $n$ balls and $n$ bins, and we are interested in lower-bounding the number of bins
(among a subset of size $|A^*|$) that have at least one ball. Applying concentration results for balls-in-bins
(Fact 1), we get that with probability $1 - e^{-\Omega(n)}$, $\mathrm{ALG} \ge (1 - 1/e)|A^*| - \epsilon n$.

Bounding OPT. To bound the optimal solution, we will construct a cut in the realization graph $\hat{G} =
(A, \hat{I}, \hat{E})$ using a min-cut of $G_f$ (constructed using the max-flow found by the algorithm) as a "guide." Let
$(S, T)$ be the min $s$-$t$ cut in $G_f$ given by the canonical "reachability" cut, i.e., $S$ is the set of
nodes reachable from $s$ using paths in the residual graph after sending the flow $f$ found by the
algorithm. This is always a min-cut [18]. Let $A_S = A \cap S$ and define $A_T$, $I_S$ and $I_T$ similarly.

We claim that there are no edges in $E$ from $A_S$ to $I_T$. Suppose there is such an edge $(a, i)$. Then $a$
must be reachable from $s$ since $a \in S$, but $i$ must not be reachable since $i \in T$. This implies that there is
no residual capacity along $(a, i)$; i.e., $f_{ai} = 1$. However, this also implies that there is no residual capacity
along $(s, a)$, since $(s, a)$ is the only edge entering $a$ and it has capacity 1, and that there is no other flow
leaving $a$. Hence $a$ could only be reached through the reverse residual edge from the unreachable node $i$, so $a$
is not reachable in the residual graph, a contradiction. Thus the only edges
in the cut $(S, T)$ are from $s$ to $A_T$ (capacity 1) and from $i \in I_S$ to $t$ (capacity $e_i$). We may conclude using
max-flow min-cut that $|A^*| = \sum_a F_a = |A_T| + \sum_{i \in I_S} e_i$.
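The canonical reachability cut is easy to compute from any integral max flow. In the small hypothetical instance below (three ads, two impression types with $e_x = 1$ and $e_y = 2$, and a hand-picked maximum flow of value 2), the Python sketch recovers $S$ and checks the identity $|A^*| = |A_T| + \sum_{i \in I_S} e_i$ derived above. The function name and node labels are ours.

```python
from collections import deque


def reachable_from_source(cap, flow, s):
    """Canonical min cut: the nodes reachable from s in the residual graph.

    cap, flow -- dicts (u, v) -> capacity / flow (no antiparallel edges assumed).
    """
    residual = {}
    for (u, v), c in cap.items():
        f = flow.get((u, v), 0)
        if f < c:
            residual.setdefault(u, []).append(v)  # forward residual capacity
        if f > 0:
            residual.setdefault(v, []).append(u)  # flow can be pushed back
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        for v in residual.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


# Hypothetical instance: ads a1, a2, a3; types x (e_x = 1) and y (e_y = 2).
e = {"x": 1, "y": 2}
cap = {("s", "a1"): 1, ("s", "a2"): 1, ("s", "a3"): 1,
       ("a1", "x"): 1, ("a2", "x"): 1, ("a3", "y"): 1,
       ("x", "t"): e["x"], ("y", "t"): e["y"]}
# An integral max flow of value 2: a1 -> x and a3 -> y.
flow = {("s", "a1"): 1, ("a1", "x"): 1, ("x", "t"): 1,
        ("s", "a3"): 1, ("a3", "y"): 1, ("y", "t"): 1}
S = reachable_from_source(cap, flow, "s")
A_star = {"a1", "a3"}                  # ads carrying flow
A_T = {"a1", "a2", "a3"} - S
I_S = {"x", "y"} & S
print(sorted(S), len(A_star), len(A_T) + sum(e[i] for i in I_S))
```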

6 Clearly, making an arbitrary available match is always at least as good as (and in some cases better than) doing nothing; we present the
algorithm this way for ease of presentation.
Now consider the realization graph $\hat{G} = (A, \hat{I}, \hat{E})$, and define a max-flow instance $\hat{G}_f$ whose solution
has size equal to the maximum matching in $\hat{G}$; i.e., create a source $s$ with edges to all $a \in A$, direct the
edges of $\hat{E}$ toward $\hat{I}$, and create a sink $t$ with edges from all $i' \in \hat{I}$. Set the capacity of every edge to one.
Note that the capacity of any $s$-$t$ cut in $\hat{G}_f$ is an upper bound on OPT.

We define an $s$-$t$ cut $(\hat{S}, \hat{T})$ in $\hat{G}_f$ as follows. Let $\hat{I}_S = \bigcup_{i \in I_S} D(i)$ and $\hat{I}_T = \bigcup_{i \in I_T} D(i)$. Define
$\hat{S} = A_S \cup \hat{I}_S$ and $\hat{T} = A_T \cup \hat{I}_T$. Note that since there are no edges from $A_S$ to $I_T$ in $G_f$, there are also no
edges from $A_S$ to $\hat{I}_T$ in $\hat{G}_f$. Thus the size of the cut $(\hat{S}, \hat{T})$ is equal to $|\hat{I}_S| + |A_T|$. An online impression
ends up in the set $\hat{I}_S$ with probability $\sum_{i \in I_S} e_i/n$, independently of the other impressions. Using a Chernoff
bound, we can conclude that for any $\epsilon > 0$, with probability $1 - e^{-\Omega(n)}$ (over the scenarios), the size of
the cut (and therefore OPT) obeys $\mathrm{OPT} \le |A_T| + \sum_{i \in I_S} e_i + \epsilon n = |A^*| + \epsilon n$.

Tightness of the Analysis. Consider a special case of the online matching problem $\mathcal{I} = (G, \mathcal{D}, n)$ where $e_i = 1$
for each $i \in I$ and the underlying graph $G$ is a complete bipartite graph with $|A| = |I| = n$. The algorithm will find a perfect
matching between $I$ and $A$, so each ad is matched exactly when its partner type arrives at least once, which happens with
probability $1 - (1 - 1/n)^n \approx 1 - 1/e$. Using Fact 1, the algorithm achieves $\mathrm{ALG} \approx (1 - 1/e)n$ with high probability.
However, the optimum is $n$. Therefore:

Theorem 4. The approximation factor of the suggested matching algorithm is $1 - 1/e$ with high probability,
and this is tight, even in expectation.
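The tight example is easy to simulate. In the Python sketch below (our own illustration; the function name is hypothetical) the suggested perfect matching is labelled so that ad $j$ is paired with type $j$, so ALG is simply the number of distinct types that arrive, while OPT equals $n$; the ratio should concentrate near $1 - 1/e \approx 0.632$.

```python
import math
import random


def tightness_ratio(n, seed=0):
    """ALG / OPT for the complete-bipartite example with e_i = 1 (sketch).

    The suggested matching is a perfect matching; we label it so that ad j is
    paired with type j. Ad j is used iff type j arrives at least once, while
    OPT = n because every arrival can be sent to a distinct ad.
    """
    rng = random.Random(seed)
    arrivals = [rng.randrange(n) for _ in range(n)]  # n uniform i.i.d. types
    alg = len(set(arrivals))
    return alg / n


print(round(tightness_ratio(5000), 3), round(1 - 1 / math.e, 3))
```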

4.2 "Two Suggested Matchings" (TSM) Algorithm: Beating $1 - 1/e$. To improve upon the suggested
matching algorithm, we will instead use two disjoint (near-)matchings to guide our online algorithm. To
find these matchings, we boost the capacities of the flow graph and then decompose the resulting solution
into disjoint solutions. The second solution allows us to break the $1 - 1/e$ barrier and prove:

Theorem 5. For any $\epsilon > 0$, with probability at least $1 - e^{-\Omega(n)}$, as long as $\mathrm{OPT} = \Omega(n)$, the two
suggested matchings algorithm achieves approximation ratio
$$\frac{\mathrm{ALG}}{\mathrm{OPT}} \ge \alpha - \epsilon, \qquad \alpha := \frac{1 - \frac{2}{e^2}}{\frac{4}{3} - \frac{2}{3e}} \approx 0.67029 > 1 - \frac{1}{e}.$$
Moreover, this ratio is tight; specifically, there is a family of instances for which the two suggested matchings
algorithm has expected approximation factor at most $\alpha + \epsilon$.
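The constant in Theorem 5 can be evaluated directly; the short Python check below confirms the quoted decimal value and that it exceeds $1 - 1/e$.

```python
import math

# Ratio from Theorem 5 and the 1 - 1/e benchmark it improves upon.
alpha = (1 - 2 / math.e ** 2) / (4 / 3 - 2 / (3 * math.e))
baseline = 1 - 1 / math.e
print(f"alpha = {alpha:.5f}, 1 - 1/e = {baseline:.5f}")
```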

Throughout the section, until the final proof of Theorem 5, we assume $e_i = 1$ for all $i \in I$, which also
implies $m = n$. Extending to integer $e_i$ is a simple reduction to this case.

4.2.1 The TSM Algorithm. In this algorithm, we construct a boosted flow graph $G_f$, built from $G$ as in
the standard reduction of matching to max-flow; i.e., create a source $s$ with edges to all $a \in A$, direct
the edges of $G$ towards the nodes in $I$, and create a sink $t$ with edges from all $i \in I$. However, we set the
capacities of the edges differently than in the max-flow reduction: (i) edges $(s, a)$ from the source get
capacity 2, (ii) edges $(a, i) \in E$ get capacity 1, and (iii) edges $(i, t)$ from $I$ to $t$ get capacity 2.

We find a max-flow in this graph from $s$ to $t$. Since all the capacities are integers, we may assume
that the resulting flow is integral [18]. Let $E_f$ be the set of edges $(a, i) \in E$ with non-zero flow on them,
which must be unit flow. Since the capacities of edges $(s, a)$ and $(i, t)$ are all 2, every node has at most two
incident edges in $E_f$, so the graph induced by $E_f$ is a collection of paths and cycles. Using this structure, we
assign the colors blue and red to the edges of $E_f$ as follows:

• Color the cycle edges alternating blue and red. 

• Color the edges of the odd-length paths alternating blue and red, with more blue than red. 
• For the even-length paths that start and end with nodes $a \in A$, alternate blue and red.

• For the even-length paths that start and end with impressions $i \in I$, color the first two edges blue,
and then alternate red, blue, red, blue, etc., ending in blue.

Note that all $i \in I$ are incident to either no colored edges, one blue edge, or a blue and a red edge.
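The coloring rules can be implemented by walking each component of $E_f$. The Python sketch below is our own illustration (function name and input format are hypothetical): it starts each path at a degree-1 endpoint, handles the remaining components as cycles, and applies the four rules listed above.

```python
from collections import defaultdict


def color_flow_edges(edges):
    """Color the edges E_f blue/red following the rules above (sketch).

    edges -- list of (ad, impression) pairs in which every node has degree <= 2,
             so each component is a path or an (even) cycle.
    Returns a dict mapping each (ad, impression) pair to "blue" or "red".
    """
    adj = defaultdict(list)
    for a, i in edges:
        adj[("ad", a)].append(("imp", i))
        adj[("imp", i)].append(("ad", a))

    def walk(start):
        # Follow the component from start; a cycle stops before revisiting start.
        nodes, prev, cur = [start], None, start
        while True:
            nxt = [v for v in adj[cur] if v != prev]
            if not nxt or nxt[0] == start:
                return nodes
            prev, cur = cur, nxt[0]
            nodes.append(cur)

    def key(u, v):
        # Turn two consecutive nodes back into an (ad, impression) pair.
        return (u[1], v[1]) if u[0] == "ad" else (v[1], u[1])

    color, seen = {}, set()
    # Paths first (from a degree-1 endpoint); unseen degree-2 nodes lie on cycles.
    starts = [v for v in adj if len(adj[v]) == 1] + [v for v in adj if len(adj[v]) == 2]
    for start in starts:
        if start in seen:
            continue
        nodes = walk(start)
        seen.update(nodes)
        is_cycle = len(adj[start]) == 2
        pairs = list(zip(nodes, nodes[1:])) + ([(nodes[-1], nodes[0])] if is_cycle else [])
        k = len(pairs)
        for j, (u, v) in enumerate(pairs):
            if not is_cycle and k % 2 == 0 and start[0] == "imp":
                # even path between two impressions: blue, blue, red, blue, ..., blue
                c = "blue" if j < 2 or j % 2 == 1 else "red"
            else:
                # cycles, odd paths, and even paths between two ads: alternate from blue
                c = "blue" if j % 2 == 0 else "red"
            color[key(u, v)] = c
    return color
```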

The TSM algorithm for serving online ad impressions is simple: For each $i \in I$, the first time $i$ arrives
try the blue edge; the second time $i$ arrives try the red edge. More formally, for all $i \in I$ maintain a count
$x_i$ of the number of impressions $i' \in D(i)$ that have arrived so far. When $i' \in D(i)$ arrives: if $x_i = 0$, set
$a'$ to be the ad along $i$'s blue edge (if $i$ has a blue edge); if $x_i = 1$, set $a'$ to be the ad along $i$'s red edge (if
$i$ has a red edge). Now assign $i'$ to $a'$ if $a'$ is unassigned. If this $a'$ is already assigned, or if $x_i > 1$, do not
make an assignment.⁷
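The online phase of TSM is a short loop over arrivals. The Python sketch below (our own illustration; the function name and dictionary inputs are hypothetical) follows the formal description above and deliberately omits the red-edge fallback mentioned in the footnote.

```python
def tsm_online(arrivals, blue, red):
    """Online phase of the TSM algorithm (illustrative sketch).

    blue, red -- dicts impression type -> ad along its blue / red edge (if any)
    The first arrival of type i tries its blue ad, the second tries its red ad,
    and later arrivals are dropped; an ad is used only if it is still free.
    Returns a dict ad -> index of the arrival assigned to it.
    """
    count = {}
    assigned = {}
    for idx, i in enumerate(arrivals):
        x = count.get(i, 0)  # x_i: number of earlier arrivals of type i
        count[i] = x + 1
        ad = blue.get(i) if x == 0 else red.get(i) if x == 1 else None
        if ad is not None and ad not in assigned:
            assigned[ad] = idx
    return assigned
```

For example, with `blue = {"i1": "a1", "i2": "a1", "i3": "a2"}` and `red = {"i3": "a3"}`, the arrival sequence `["i3", "i1", "i3", "i2", "i3"]` fills `a2`, `a1` and `a3`, and the fourth arrival is dropped because `a1` is already taken.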

4.2.2 Performance of the TSM Algorithm. To analyze the performance of this algorithm, we first
derive a lower bound on the number of ads assigned during the run of the algorithm. We do so in terms of
the incidence pattern of the different ads with respect to the edges $E_f$. Specifically, let $A_{BR}$ be the ads that
are incident to a blue and a red edge, and $A_B$ be the ads that are incident to only a blue edge. Similarly
define $A_{BB}$ and $A_R$. We have

$$|E_f| = 2|A_{BR}| + 2|A_{BB}| + |A_B| + |A_R|. \qquad (1)$$

Consider some $a \in A_B$ with blue edge $(a, i)$. The event that $a$ is ever chosen is exactly the event that some
$i' \in D(i)$ is ever drawn from $\mathcal{D}$, since then we will choose $a$ (and no other impression will choose $a$). Since
$e_i = 1$, this is exactly the probability that a particular bin is non-empty in a balls-in-bins problem with
$n$ balls (the online impressions) and $m = n$ bins (the impression types $I$). Applying Fact 1, we get that
with high probability the number of ads chosen from $A_B$ is at least $|A_B|(1 - 1/e) - \epsilon n$. Now consider some
$a \in A_{BR}$ with blue edge $(a, i_b)$ and red edge $(a, i_r)$. If $|D(i_b)| \ge 1$ or $|D(i_r)| \ge 2$, then $a$ will definitely
be chosen. Thus we can apply Fact 2 with $n$ balls, $m = n$ bins, $c = 2$, bin sequences equal to the
neighborhood sets of $A_{BR}$ along the blue and red edges (ordered blue, red), $d = 2$ (since each impression is
incident to at most 2 edges of $E_f$), and $\mathcal{R}$ set to the second (red) position of the bin sequence. We conclude that
with high probability, the number of ads chosen from $A_{BR}$ is at least $|A_{BR}|(1 - 2/e^2) - \epsilon n$. Similar reasoning
gives bounds with coefficients of $(1 - 1/e^2)$ for $A_{BB}$ and $(1 - 2/e)$ for $A_R$. We may conclude that with high
probability (over the scenarios),
$$\mathrm{ALG} \ge \left(1 - \frac{1}{e^2}\right)|A_{BB}| + \left(1 - \frac{2}{e^2}\right)|A_{BR}| + \left(1 - \frac{1}{e}\right)|A_B| + \left(1 - \frac{2}{e}\right)|A_R| - 4\epsilon n.$$
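Note that $|A_B| \ge |A_R|$: by the coloring rules, an ad incident only to a red edge is an endpoint of an
even-length path between two ads, whose other endpoint is incident only to a blue edge. Hence we can also assert
$$\mathrm{ALG} \ge \left(1 - \frac{1}{e^2}\right)|A_{BB}| + \left(1 - \frac{2}{e^2}\right)|A_{BR}| + \left(1 - \frac{3}{2e}\right)\left(|A_B| + |A_R|\right) - 4\epsilon n. \qquad (2)$$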
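The passage to (2) only uses $|A_B| \ge |A_R|$. Splitting the two coefficients around their average $1 - \frac{3}{2e}$ makes this explicit (our own expansion of the step):

```latex
\left(1 - \tfrac{1}{e}\right)|A_B| + \left(1 - \tfrac{2}{e}\right)|A_R|
  = \left(1 - \tfrac{3}{2e}\right)\left(|A_B| + |A_R|\right)
    + \tfrac{1}{2e}\left(|A_B| - |A_R|\right)
  \ge \left(1 - \tfrac{3}{2e}\right)\left(|A_B| + |A_R|\right).
```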

4.2.3 Bound on the optimal solution. Let $(S, T)$ be a particular min $s$-$t$ cut of the flow graph $G_f$
defined as follows. First start with the canonical "reachability" min $s$-$t$ cut of the flow graph $G_f$, where
$S$ is defined as the set of nodes reachable from $s$ in the residual graph left after finding the max-flow
$E_f$. Then, we do a small bit of "surgery" on this cut: for all $i \in I \cap T$, if $i$ is incident to more than one
$a \in A \cap S$, we move $i$ over to $S$. Note that this does not increase the value of the cut, since we save at
least 2 for the two edges from $A \cap S$, and pay exactly 2 for the edge $(i, t)$. Let $A_S = A \cap S$, and define
$A_T$, $I_S$ and $I_T$ similarly. Let $E_\delta$ be the set of edges $(a, i) \in E$ that cross the cut (from $A_S$ to $I_T$).

Some observations: (i) We have $E_\delta \subseteq E_f$, since otherwise, if some $(a, i) \in E_\delta$ has no flow across
it, then $i$ would be reachable from $s$ (through $a$), and would not be in the set $T$. (And we did not introduce any such
edges in our surgery.) (ii) All $i \in I_T$ have at most one incident edge in $E_\delta$ (this follows from the surgery). (iii)
All $a \in A_S$ have at most one incident edge in $E_\delta$. To see this, suppose $a$ had two such edges (it cannot
have more than 2 since $E_\delta \subseteq E_f$). Then, since $a$ is reachable from $s$ (as it is in $S$), it must have either
residual capacity from $s$ directly, or residual capacity from $I_S$; but it cannot have either, since $(s, a)$ is
saturated and both flow edges from $a$ go to $I_T$.

7 A slight improvement to this algorithm is to try to match along the red edge if matching along the blue edge fails; we do not
make use of this in the analysis, so we leave it out for clarity.

Let $A_\delta$, $I_\delta$ be the ads and impressions, respectively, that are incident to edges in $E_\delta$. We may conclude
from the observations above that the graph $(A_\delta, I_\delta, E_\delta)$ induced by $E_\delta$ is a matching. The min-cut of $G_f$
is made up of the edges $E_\delta$, the $|A_T|$ edges from $s$ to $A_T$ (with capacity 2), and the edges from $I_S$ to
$t$ (also capacity 2). Thus, by max-flow-min-cut, we have
$$|E_f| = 2\left(|A_T| + |I_S|\right) + |E_\delta|. \qquad (3)$$

We are interested in bounding the value of the optimal matching in the realization graph $\hat{G} = (A, \hat{I}, \hat{E})$.
To do this, we will use the min-cut $(S, T)$ of the graph $G_f$ as a "guide" to construct a (not necessarily minimum)
cut in a flow graph built from $\hat{G}$, and prove a high-probability bound on the size of this cut.

More precisely, we let $\hat{G}_f$ be a directed version of $\hat{G}$, constructed as before with a source and a sink,
and edges corresponding to $\hat{G}$; but now we put capacity 1 on all edges. Note that any $s$-$t$ cut in this graph
constitutes an upper bound on OPT, the maximum matching in $\hat{G}$. We construct such a cut $(\hat{S}, \hat{T})$ as
follows. We let $\hat{I}_S = \bigcup_{i \in I_S} D(i)$ and $\hat{I}_T = \bigcup_{i \in I_T} D(i)$. For the ads, we will use almost the same partition
$(A_S, A_T)$ as in $G_f$, but we will perform some "surgery" on this partition as well. Let $A^*_\delta$ be the set
of ads $a \in A_S$ that are incident (in $\hat{G}$) to some $i' \in D(i) \subseteq \hat{I}_T$. For such an ad, $i \in I_T$ and hence $(a, i) \in E_\delta$,
so $A^*_\delta \subseteq A_\delta$. We set $\hat{S} = \hat{I}_S \cup (A_S \setminus A^*_\delta)$ and $\hat{T} = \hat{I}_T \cup A_T \cup A^*_\delta$.

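Now we measure the size of the cut $(\hat S, \hat T)$ in $\hat G_f$. We pay 1 for each $i \in \hat I_S$ (its edge to $t$), and 1 for each $a \in A_T$ and each $a \in A^*_S$ (its edge from $s$). There are no edges in $\hat G_f$ from $A \cap \hat S$ to $\hat I_T$, since we got rid of them in our surgery. Thus we have $\mathrm{OPT} \le |\hat I_S| + |A_T| + |A^*_S|$.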

Using a Chernoff bound, for any $\epsilon > 0$, with probability $1 - e^{-\Omega(n)}$ we have $|\hat I_S| \le |I_S| + \epsilon n$. To bound $|A^*_S|$, consider some $a \in A^*$ and the impression $i \in I^*$ matched to it by the edge $(a, i)$ of the matching $(A^*, I^*, E^*)$. The ad $a$ appears in $A^*_S$ iff impression $i$ is drawn during the run of the algorithm. Thus we have a balls-in-bins problem with $n$ balls, $m = n$ bins, uniform bin probabilities and a bin subset of size $|A^*| = |E^*|$, and we are concerned with an upper bound on the number of bins in that subset that get at least one ball. Using Fact 1, we may conclude that with high probability $|A^*_S| \le (1 - \tfrac1e)|E^*| + \epsilon n + O(1)$.

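Summarizing the previous arguments, we get, for any $\epsilon > 0$, with probability $1 - e^{-\Omega(n)}$,

$$\mathrm{OPT} \le |I_S| + |A_T| + \left(1 - \tfrac1e\right)|E^*| + \epsilon n.$$

Substituting $|A_T| + |I_S| = \tfrac12\left(|E_f| - |E^*|\right)$ from Equation (3), and then $|E_f| = 2|A_{BR}| + 2|A_{BB}| + |A_B| + |A_R|$ (each ad in $A_{BR}$ or $A_{BB}$ carries two units of flow, and each ad in $A_B$ or $A_R$ carries one), we get

$$\begin{aligned}
\mathrm{OPT} &\le \tfrac12|E_f| + \left(\tfrac12 - \tfrac1e\right)|E^*| + \epsilon n \\
&= |A_{BR}| + |A_{BB}| + \tfrac12\left(|A_B| + |A_R|\right) + \left(\tfrac12 - \tfrac1e\right)|E^*| + \epsilon n.
\end{aligned} \tag{4}$$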

In order to use this bound on OPT together with the bound on ALG in Equation (2), we must bound the size of $E^*$ in terms of the sets $A_{BR}$, $A_{BB}$, $A_B$ and $A_R$. The following lemma takes a deeper look at the two matchings constructed by the algorithm and their relationship to the min-cut $(S, T)$ in $G_f$, in order to achieve this bound.

Lemma 1. $|E^*| \le \tfrac23|A_{BR}| + \tfrac43|A_{BB}| + |A_B| + \tfrac13|A_R|$.

Proof. It suffices to show that the inequality holds for every connected component (path or cycle) of the graph induced by $E_f$. We thus assume notationally that the graph induced by $E_f$ consists of a single such connected component.

Consider an arbitrary pair of edges $(a_1, i_1), (a_2, i_2) \in E^* \subseteq E_f$. Since the edges of $E^*$ are independent, $(a_1, i_1)$ and $(a_2, i_2)$ cannot occur consecutively in this component (path or cycle); we claim further that $(a_1, i_1)$ and $(a_2, i_2)$ must have at least two edges between them. Suppose not; then, without loss of generality, $(a_2, i_1)$ is in the component. But since $a_2 \in A_S$ and $i_1 \in I_T$ (by the definition of $E^*$), we must have $(a_2, i_1) \in E^*$, contradicting the fact that the edges of $E^*$ are independent.

If the component is a cycle of length $k$, the reasoning above shows that it contains at most $\lfloor k/3 \rfloor$ edges of $E^*$. The ads in the cycle are all in $A_{BR}$, and there are exactly $k/2$ of them. Thus $|E^*| \le \tfrac23|A_{BR}|$, which implies the inequality.

If the component is a path of length $k$, the reasoning above gives $|E^*| \le \lceil k/3 \rceil$; the worst case is when the path starts and ends with an $E^*$ edge. We have three cases for this path, depending on the parity of its length and (for even-length paths) whether it starts and ends in $A$ or in $I$.

• For odd paths of length $k$, by construction of the edge colors, we have one ad in $A_B$ and $(k-1)/2$ ads in $A_{BR}$. Thus $|E^*| \le \lceil k/3 \rceil = \lceil \tfrac23|A_{BR}| + \tfrac13 \rceil \le \tfrac23|A_{BR}| + 1 = \tfrac23|A_{BR}| + |A_B|$.

• For even paths of length $k$ that start and end with ads, we have $|A_B| = 1$, $|A_R| = 1$ and $|A_{BR}| = k/2 - 1$. Thus $|E^*| \le \lceil k/3 \rceil = \lceil \tfrac23|A_{BR}| + \tfrac23 \rceil$. We bound this using a case analysis on $|A_{BR}| \bmod 3$: (i) if $|A_{BR}| \equiv 0 \pmod 3$, then $|E^*| \le \tfrac23|A_{BR}| + 1$; (ii) if $|A_{BR}| \equiv 1 \pmod 3$, then $|E^*| \le \tfrac23|A_{BR}| + \tfrac43$; (iii) if $|A_{BR}| \equiv 2 \pmod 3$, then $|E^*| \le \tfrac23|A_{BR}| + \tfrac23$. In all cases this is at most $\tfrac23|A_{BR}| + \tfrac43 = \tfrac23|A_{BR}| + |A_B| + \tfrac13|A_R|$.

• For even-length paths that start and end in impressions, we have $|A_{BB}| = 1$ and $|A_{BR}| = k/2 - 1$. As in the previous case, $|E^*| \le \lceil k/3 \rceil = \lceil \tfrac23|A_{BR}| + \tfrac23 \rceil$, and the same case analysis shows that $|E^*|$ is at most $\tfrac23|A_{BR}| + \tfrac43$. This is equal to $\tfrac23|A_{BR}| + \tfrac43|A_{BB}|$. □
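The case analysis can be checked mechanically. The sketch below (our own, with hypothetical helper names) enumerates the four component shapes used in the proof for every length up to 300, plugs in the ad counts stated for each shape, and compares the largest possible number of $E^*$ edges with the right-hand side of Lemma 1, using exact rational arithmetic.

```python
from fractions import Fraction

def lemma1_rhs(a_br, a_bb=0, a_b=0, a_r=0):
    """Right-hand side of Lemma 1 as an exact fraction."""
    return Fraction(2, 3) * a_br + Fraction(4, 3) * a_bb + a_b + Fraction(1, 3) * a_r

def component_bounds(k):
    """(max number of E* edges, Lemma 1 RHS) for each component shape of length k."""
    path_max = -(-k // 3)          # ceil(k/3): E* edges are separated by at least two edges
    shapes = []
    if k % 2 == 0 and k >= 4:      # cycle: floor(k/3) E* edges, k/2 ads, all in A_BR
        shapes.append((k // 3, lemma1_rhs(k // 2)))
    if k % 2 == 1:                 # odd path: one ad in A_B, (k-1)/2 ads in A_BR
        shapes.append((path_max, lemma1_rhs((k - 1) // 2, a_b=1)))
    if k % 2 == 0:                 # even paths: ad endpoints, then impression endpoints
        shapes.append((path_max, lemma1_rhs(k // 2 - 1, a_b=1, a_r=1)))
        shapes.append((path_max, lemma1_rhs(k // 2 - 1, a_bb=1)))
    return shapes

violations = [(k, e, r) for k in range(1, 301) for e, r in component_bounds(k) if e > r]
print(violations)   # []
```

Case (ii) of the even-path analysis is where the bound is tight: for $k = 4$ with ad endpoints, both sides equal 2.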



4.2.4 Proof of the main theorem. We first prove the approximation ratio for $e_i = 1$. The bounds in Equations (2) and (4) each hold with probability $1 - e^{-\Omega(n)}$, and so, by a union bound, they both hold with probability $1 - e^{-\Omega(n)}$. Using Lemma 1 (with the coefficient $\tfrac13$ in front of $|A_R|$ relaxed to 1) and Equation (4), we get

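$$\mathrm{OPT} \le \left(\tfrac43 - \tfrac{2}{3e}\right)|A_{BR}| + \left(\tfrac53 - \tfrac{4}{3e}\right)|A_{BB}| + \left(1 - \tfrac1e\right)\left(|A_B| + |A_R|\right) + \epsilon' n.$$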

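Since $\mathrm{OPT} = \Omega(n)$, we can choose $\epsilon'$ small enough that, when we apply Equation (2) (also using $\epsilon'$), we may conclude

$$\frac{\mathrm{ALG}}{\mathrm{OPT}} + \epsilon \ge \min\left\{\frac{1 - 1/e^2}{5/3 - 4/(3e)},\ \frac{1 - 2/e^2}{4/3 - 2/(3e)},\ \frac{1 - 3/(2e)}{1 - 1/e}\right\} = \min\{.735\ldots, .670\ldots, .709\ldots\} = \frac{1 - 2/e^2}{4/3 - 2/(3e)} \approx .670.$$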
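As a quick numerical check of the minimum above, the following sketch evaluates the three ratios. The denominators are the OPT coefficients derived in the proof. The numerators are the per-class ALG coefficients, taken here as $1 - 1/e^2$ (for $A_{BB}$), $1 - 2/e^2$ (for $A_{BR}$) and $1 - 3/(2e)$ (for $A_B$ and $A_R$ together). These are assumptions consistent with the quoted values $.735$, $.670$ and $.709$, since Equation (2) is stated earlier in the paper.

```python
from math import e

# Ratio of the (assumed) ALG coefficient to the OPT coefficient for each ad class;
# the overall guarantee is the smallest of these ratios.
ratios = {
    "A_BB": (1 - 1 / e**2) / (5 / 3 - 4 / (3 * e)),
    "A_BR": (1 - 2 / e**2) / (4 / 3 - 2 / (3 * e)),
    "A_B, A_R": (1 - 3 / (2 * e)) / (1 - 1 / e),
}
for name, r in ratios.items():
    print(f"{name:9s} {r:.4f}")
worst = min(ratios, key=ratios.get)
print("binding class:", worst)
```

The binding class is $A_{BR}$, which is why the tight example in Section 4.2.5 consists entirely of ads with one blue and one red edge.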

The tightness of this analysis is proved in Section 4.2.5. For arbitrary integer $e_i$, we give a reduction to the case $e_i = 1$. Given an instance $\Gamma = (G, \mathcal{D}, n)$, we reduce to a new instance $\Gamma' = (G', \mathcal{D}', n)$ with $e'_{i'} = 1$ for every impression type $i'$ by making $e_i$ copies of each impression type $i$. Then, when an impression of type $i$ arrives online, we "name" it as one of its $e_i$ copies, chosen uniformly at random. The resulting distribution $\mathcal{D}'$ is uniform over the impression types $i'$ in the new instance.

Let $\hat I$ be the impressions drawn from $\mathcal{D}$ in one run of the algorithm, and let $\hat I'$ be the resulting draws from $\mathcal{D}'$. By the arguments above, we achieve the desired bound on $\mathrm{ALG}/\mathrm{OPT}'$ with high probability, where $\mathrm{OPT}'$ is with respect to $\hat I'$. However, $\mathrm{OPT}' = \mathrm{OPT}$, since the realization graphs $\hat G = (A, \hat I, \hat E)$ and $\hat G' = (A, \hat I', \hat E')$ are in fact the same graph.
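The renaming step is simple enough to sketch directly. In the snippet below (an illustration with hypothetical names `make_copies` and `rename`), an instance with rates $e_x = 3$ and $e_y = 1$ is expanded into four unit-rate copies. Arrivals are drawn from the original distribution (type $i$ with probability $e_i/n$), and each arrival is renamed as a uniformly random copy of its type. The empirical frequency of every copy should then be close to $1/n$, as claimed for the new distribution.

```python
import random
from collections import Counter

def make_copies(e):
    """Replace each impression type i (integer rate e[i]) by copies (i, 0), ..., (i, e[i] - 1)."""
    return {i: [(i, j) for j in range(rate)] for i, rate in e.items()}

def rename(impression_type, copies, rng):
    """Online step: name an arriving impression of this type as a uniformly random copy."""
    return rng.choice(copies[impression_type])

rng = random.Random(1)
e = {"x": 3, "y": 1}                # hypothetical instance with n = 4 expected arrivals
n = sum(e.values())
copies = make_copies(e)
types = list(e)
probs = [e[i] / n for i in types]   # original distribution: type i w.p. e_i / n
trials = 40000
counts = Counter(rename(rng.choices(types, weights=probs)[0], copies, rng)
                 for _ in range(trials))
freq = {c: k / trials for c, k in sorted(counts.items())}
print(freq)   # every copy should be close to 1/n = 0.25
```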

4.2.5 Tightness of the analysis for the TSM Algorithm. In this section we demonstrate a family of instances for which the TSM algorithm achieves a factor no better than $\frac{1 - 2/e^2}{4/3 - 2/(3e)}$, thus showing that the analysis in Section 4 is tight.
The family is parameterized by $n$, which is the number of advertisers, the number of impression types, and the number of impression arrivals. We take $n$ to be a multiple of 4. The set $A$ of advertisers consists of the following parts: a set $K$ of size $n/4$ and, for $i \in [1, n/4]$, advertisers $\{u_i, v_i, w_i\}$. The set $I$ of impressions consists of the following parts: a set $L$ of size $n/4$ and, for $i \in [1, n/4]$, impressions $\{x_i, y_i, z_i\}$. Define $U = \{u_i : i \in [1, n/4]\}$, and similarly $V, W, X, Y, Z$. Thus $A = K \cup U \cup V \cup W$ and $I = L \cup X \cup Y \cup Z$. Draws are from the uniform distribution on $I$.

The edges $E$ are as follows: (i) for $i \in [1, n/4]$, the 6-cycle $u_i - x_i - v_i - y_i - w_i - z_i - u_i$; (ii) a complete bipartite graph between $K$ and $X$; and (iii) a complete bipartite graph between $L$ and $W$.

We now describe the max-flow and min-cut in $G_f$ found during the algorithm. The only edges with (unit) flow are the edges of the 6-cycles, i.e., for $i \in [1, n/4]$, $u_i - x_i - v_i - y_i - w_i - z_i - u_i$. Thus all vertices in $U, V, W, X, Y$ and $Z$ have a flow of 2 each, and the vertices in $K$ and $L$ have a flow of 0. The reachability cut $(S, T)$ obtained from this flow has $S = K \cup X \cup U \cup V \cup \{s\}$ (where $s$ is the source vertex). The flow and the cut both have size $3n/2$. Using Fact 2, one can easily check that the algorithm achieves a total matching size of $\frac{3n}{4}\left(1 - \frac{2}{e^2}\right)$ with high probability.

The following assignment can be made with high probability, and is a lower bound on OPT. (i) With high probability there will be about $n/4$ draws of impressions from $X$ (with repeats). These can be matched to the $n/4$ advertisers in $K$ (in any order). (ii) With high probability there will be about $n/4$ draws of impressions from $L$. These can be matched to the $n/4$ advertisers in $W$. (iii) With high probability there will be about $(1 - \frac1e)\frac n4$ unique draws of impressions from $Y$ (counting each $y_i$ only once, even if it is drawn multiple times). For every such $y_i$, its first draw is matched to $v_i$, and the repeat draws of $y_i$ are left unmatched. Similarly, with high probability there are about $(1 - \frac1e)\frac n4$ unique draws of $z_i$'s, and these are matched to the corresponding $u_i$'s. Thus this assignment has size $\frac n4 + \frac n4 + (1 - \frac1e)\frac n2 = n\left(1 - \frac{1}{2e}\right)$. This means that the TSM algorithm cannot achieve a factor better than $\frac34\left(1 - \frac{2}{e^2}\right) \big/ \left(1 - \frac{1}{2e}\right) = \frac{1 - 2/e^2}{4/3 - 2/(3e)}$.
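The following Monte Carlo sketch (ours, with hypothetical names) simulates the tight family. It assumes a particular two-matching decomposition of each 6-cycle: blue edges $u_i x_i$, $v_i y_i$, $w_i z_i$ and red edges $x_i v_i$, $y_i w_i$, $z_i u_i$. By symmetry, the other alternating coloring behaves the same. TSM routes the first arrival of an impression along its blue edge and the second along its red edge, drops later arrivals, and does not use the improvement from footnote 7. The OPT side evaluates only the explicit assignment (i)-(iii), so it is a lower bound on OPT.

```python
import random
from math import e

def simulate_tight_instance(n, seed=0):
    """Simulate TSM and the explicit OPT assignment on the tight family (n divisible by 4).

    Impression types are ("L", j), ("x", j), ("y", j), ("z", j) for j < n/4.
    """
    q = n // 4
    rng = random.Random(seed)
    blue = {"x": "u", "y": "v", "z": "w"}   # ad used at an impression's 1st arrival
    red = {"x": "v", "y": "w", "z": "u"}    # ad used at its 2nd arrival
    arrivals = {}            # arrivals so far per impression type
    matched = set()          # ads matched by TSM
    seen = set()             # distinct impression types drawn
    draws = {"L": 0, "x": 0, "y": 0, "z": 0}
    for _ in range(n):
        kind, j = rng.choice("Lxyz"), rng.randrange(q)   # uniform over the n types
        draws[kind] += 1
        seen.add((kind, j))
        arrivals[(kind, j)] = arrivals.get((kind, j), 0) + 1
        k = arrivals[(kind, j)]
        if kind == "L" or k > 2:
            continue         # L carries no flow; 3rd and later arrivals are discarded
        ad = ((blue if k == 1 else red)[kind], j)
        if ad not in matched:
            matched.add(ad)  # match only if the suggested ad is still free
    uniq_y = sum(1 for kind, _ in seen if kind == "y")
    uniq_z = sum(1 for kind, _ in seen if kind == "z")
    opt_lb = min(draws["x"], q) + min(draws["L"], q) + uniq_y + uniq_z
    return len(matched), opt_lb

n = 40000
alg, opt_lb = simulate_tight_instance(n)
print(alg / n, 0.75 * (1 - 2 / e**2))        # TSM vs. (3/4)(1 - 2/e^2)
print(opt_lb / n, 1 - 1 / (2 * e))           # assignment (i)-(iii) vs. 1 - 1/(2e)
print(alg / opt_lb, (1 - 2 / e**2) / (4 / 3 - 2 / (3 * e)))
```

At this size the empirical ratio typically lands within a few thousandths of $0.670$.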

5 Concluding Remarks 

Applying the insights to the display ads application. The approach of using the offline solution to 
allocate ads online may be quite useful in practice because while one can invest some time offline to find 
the guiding solutions, the online allocation has to be done very quickly in this application. One can use this 
approach to model other objective functions such as fairness in quality of ad slots assigned to ads, which 
may be solvable offline with some computational effort. As an example, we elaborate on the extension of 
our algorithm to the following problem. In the display ads business, advertisers have "frequency caps",
i.e., they do not want the same user to see their ad more than some fixed (constant) number of times. We
can extend our approach here to get a 1 — 1/e approximation as shown in the appendix. 

Generalizing the algorithm. One can generalize the two-matching algorithm to a "$k$-matching" algorithm by computing $k$ matchings instead of 2 matchings, and then using them online in a prescribed order. We can easily show that if the underlying expected graph $G$ admits $k$ edge-disjoint perfect matchings, the approximation factor of such an algorithm is $1 - 2/e^2 \approx 0.73$ for $k = 2$ and $1 - 5/e^3 \approx 0.75$ for $k = 3$; however, for $k = 3$ we do not know how to generalize our result to general graphs. One natural question left open by this work is what constant $c(k)$ is achieved by extending to $k$ matchings, where $0.67 < c(k) < 0.99$.
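The factors for $k = 2$ and $k = 3$ can be reproduced under two stated assumptions: the $j$-th arrival of an impression type is routed along its edge in the $j$-th suggested matching, and the number of arrivals of each type is modeled as Poisson(1). With $k$ edge-disjoint perfect matchings, an ad is then left unmatched only if, for every $j$, its neighbor in matching $j$ arrives fewer than $j$ times. The sketch below is this heuristic calculation, not a proof.

```python
from math import exp, factorial

def poisson_cdf(j, lam=1.0):
    """Pr[Poisson(lam) <= j]."""
    return sum(exp(-lam) * lam**r / factorial(r) for r in range(j + 1))

def k_matching_factor(k):
    """Limit matching probability of an ad under the assumed k-matching rule:
    the ad is missed only if, for every j = 1..k, its neighbour in matching j
    arrives at most j - 1 times (neighbours are distinct types, independent in the limit)."""
    miss = 1.0
    for j in range(1, k + 1):
        miss *= poisson_cdf(j - 1)
    return 1 - miss

for k in (1, 2, 3):
    print(k, round(k_matching_factor(k), 3))
```

For $k = 1, 2, 3$ this gives $1 - 1/e$, $1 - 2/e^2$ and $1 - 5/e^3$, respectively.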

Fractional version. A theoretical version of the online stochastic matching problem that may be of interest is the case in which the $e_i$'s are not necessarily integers, but arbitrary rational numbers. We observe that the analysis of the "one suggested matching" algorithm can be generalized to this case, but we do not know how to generalize the analysis of the "two suggested matchings" algorithm. The details are in the appendix.
A Balls in Bins

In this part, we prove the concentration facts we used throughout the paper. 

Fact 1. Suppose $n$ balls are thrown into $n$ bins, i.i.d. with uniform probability over the bins. Let $B$ be a particular subset of the bins, and let $S$ be the random variable that equals the number of bins from $B$ with at least one ball. For any $\epsilon > 0$, with probability at least $1 - 2e^{-\epsilon^2 n/2}$, we have

$$|B|\left(1 - \tfrac1e\right) - \epsilon n \le S \le |B|\left(1 - \tfrac1e + \tfrac{1}{en}\right) + \epsilon n.$$



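Proof of Fact 1. We have $E[S] = \sum_{b \in B}\left(1 - (1 - \tfrac1n)^n\right)$, and so, using the standard bounds $e^{-1}(1 - \tfrac1n) \le (1 - \tfrac1n)^n \le e^{-1}$, we obtain

$$\sum_{b \in B}\left(1 - e^{-1}\right) \le E[S] \le \sum_{b \in B}\left(1 - e^{-1}\left(1 - \tfrac1n\right)\right).$$

Since $S$, as a function of the placements of the $n$ balls, satisfies the Lipschitz condition (moving one ball changes $S$ by at most 1), we may apply Azuma's inequality to the Doob martingale and obtain

$$\Pr\left[\,|S - E[S]| > \epsilon n\,\right] \le 2e^{-\epsilon^2 n/2}.$$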

□ 
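A quick Monte Carlo check of Fact 1 (our sketch; the seed, $n$ and the choice of $B$ as half of the bins are arbitrary):

```python
import random
from math import e

def bins_hit_in_subset(n, subset, rng):
    """Throw n balls u.a.r. into n bins; return how many bins of `subset` receive a ball."""
    occupied = {rng.randrange(n) for _ in range(n)}
    return len(occupied & subset)

rng = random.Random(7)
n = 20000
B = set(range(n // 2))
S = bins_hit_in_subset(n, B, rng)
lower = len(B) * (1 - 1 / e)
upper = len(B) * (1 - 1 / e + 1 / (e * n))
print(S, round(lower, 1), round(upper, 1))
```

In practice $S$ falls within a few multiples of $\sqrt{n}$ of the interval, far inside the $\epsilon n$ slack that Azuma's inequality guarantees.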



Fact 2. Suppose $n$ balls are thrown into $n$ bins, i.i.d. with uniform probability over the bins. Let $B_1, B_2, \ldots, B_p$ be ordered sequences of bins, each of size $c$, where no bin is in more than $d$ such sequences. Fix some arbitrary subset $R \subseteq \{1, \ldots, c\}$. We say that a bin sequence $B_a = (b_1, \ldots, b_c)$ is "satisfied" if

• at least one of its bins $b_i$ with $i \notin R$ has at least one ball in it; or,

• at least one of its bins $b_i$ with $i \in R$ has at least two balls in it.

Let $S$ be the random variable that equals the number of satisfied bin sequences. With probability at least $1 - 2e^{-\epsilon^2 n/2}$, we have

$$S \ge p\left(1 - \frac{2^{|R|}}{e^c}\left(1 + \frac{c^2}{n - c^2}\right)\right) - \epsilon d n.$$
Proof of Fact 2. First, we claim

$$E[S] \ge p\left(1 - \frac{2^{|R|}}{e^c}\left(1 + \frac{c^2}{n - c^2}\right)\right).$$

To see this, fix some bin sequence $B_a$. The probability that $B_a$ is not satisfied is

The bound on $E[S]$ follows by linearity of expectation. Now consider $S$ as a function of the placements of the $n$ balls. Moving one ball can affect $S$ by at most $d$, since each bin is in at most $d$ sequences. Thus we may apply Azuma's inequality and obtain, for all $\epsilon > 0$,

$$\Pr\left[\,|S - E[S]| > \epsilon d n\,\right] \le 2e^{-\epsilon^2 n/2}.$$

□ 
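Fact 2 can be checked the same way. The sketch below (hypothetical helper `satisfied_fraction`) uses disjoint sequences, so $d = 1$, and compares the empirical fraction of satisfied sequences with the leading term $1 - 2^{|R|}/e^c$ of the bound. The case $c = 2$, $R = \{2\}$ roughly mirrors an ad in $A_{BR}$ under TSM: it is matched if its blue impression arrives at least once or its red impression arrives at least twice.

```python
import random
from math import e

def satisfied_fraction(n, c, R, seed=3):
    """Fraction of disjoint length-c bin sequences that are 'satisfied' (so d = 1).

    Positions are numbered 1..c as in Fact 2: a position in R needs >= 2 balls,
    any other position needs >= 1 ball.
    """
    rng = random.Random(seed)
    load = [0] * n
    for _ in range(n):
        load[rng.randrange(n)] += 1
    seqs = [range(start, start + c) for start in range(0, n - c + 1, c)]

    def satisfied(seq):
        return any(load[b] >= (2 if pos in R else 1) for pos, b in enumerate(seq, start=1))

    return sum(1 for seq in seqs if satisfied(seq)) / len(seqs)

n = 60000
for c, R in [(2, {2}), (3, {2, 3})]:
    print(c, sorted(R), round(satisfied_fraction(n, c, R), 4), round(1 - 2 ** len(R) / e**c, 4))
```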



B Details of the proof for Hardness Result 

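Consider the instance that consists of a large number $k$ of copies of 6-cycles, the uniform distribution on the union of the impressions, and $n = 3k$. Let $\gamma_1, \gamma_2, \gamma_3$ and $\gamma_+$ be the fractions of the cycles that receive 1, 2, 3, and more than 3 impressions, respectively. We have (using a simple application of Azuma's inequality) that with high probability,

$$\begin{aligned}
\gamma_1 &\approx \binom{3k}{1}\frac1k\left(1 - \frac1k\right)^{3k-1} \approx \frac{3}{e^3}, \\
\gamma_2 &\approx \binom{3k}{2}\frac{1}{k^2}\left(1 - \frac1k\right)^{3k-2} \approx \frac{9}{2e^3}, \\
\gamma_3 &\approx \binom{3k}{3}\frac{1}{k^3}\left(1 - \frac1k\right)^{3k-3} \approx \frac{27}{6e^3}, \\
\gamma_+ &= 1 - \gamma_1 - \gamma_2 - \gamma_3.
\end{aligned}$$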

For cycles that receive 1 or 2 impressions, we can assume that both ALG and OPT match 1 or 2 ads, respectively. As we are upper-bounding $E[\mathrm{ALG}/\mathrm{OPT}]$, we may assume that on cycles that receive more than 3 impressions, both ALG and OPT achieve 3 matches, which maximizes the contribution of these cycles to the ratio $\mathrm{ALG}/\mathrm{OPT}$.

For cycles that receive exactly 3 impressions, we have the same situation as in the single cycle above. We assume without loss of generality that $x$ arrives first and is matched to ad $a$. If the other two impressions are also both $x$, then both ALG and OPT match two ads ($a$ and $c$) for this cycle. If the other two impressions are both $y$, then ALG matches at most two ads but OPT matches three. In all other scenarios, we assume that both ALG and OPT match three ads. By a Chernoff bound, with high probability the scenarios $(x, x, x)$ and $(x, y, y)$ each happen about $\gamma_3 k/9$ times.

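Summarizing, we have argued that with high probability,

$$\frac{\mathrm{ALG}}{\mathrm{OPT}} \lesssim \frac{\gamma_1 + 2\gamma_2 + 3\gamma_+ + \left(2 \cdot \frac29 + 3 \cdot \frac79\right)\gamma_3}{\gamma_1 + 2\gamma_2 + 3\gamma_+ + \left(2 \cdot \frac19 + 3 \cdot \frac89\right)\gamma_3} \approx \frac{6e^3 - 23}{6e^3 - 22} \approx 0.9898.$$

This establishes Theorem 3.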
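The limiting ratio can be reproduced numerically. The sketch below (ours) uses the exact binomial expectations $\binom{3k}{j}k^{-j}(1 - 1/k)^{3k-j}$ for $\gamma_1, \gamma_2, \gamma_3$ in place of their high-probability values. Like the text, it lets $\gamma_+$ absorb every remaining cycle, including cycles with no arrivals, which only pushes the ratio up.

```python
from math import comb, exp

def hardness_ratio(k):
    """Appendix B ratio with gamma_j replaced by exact binomial expectations."""
    n = 3 * k
    p = 1 / k                                  # each draw lands in a fixed cycle w.p. 3/n = 1/k
    g1, g2, g3 = (comb(n, j) * p**j * (1 - p) ** (n - j) for j in (1, 2, 3))
    g_plus = 1 - g1 - g2 - g3                  # as in the text
    alg = g1 + 2 * g2 + 3 * g_plus + (2 * 2 / 9 + 3 * 7 / 9) * g3
    opt = g1 + 2 * g2 + 3 * g_plus + (2 * 1 / 9 + 3 * 8 / 9) * g3
    return alg / opt

limit = (6 * exp(3) - 23) / (6 * exp(3) - 22)
print(round(hardness_ratio(10**6), 5), round(limit, 5))
```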
C Non-integral Impression Arrival Rates

One natural extension of the online stochastic matching problem is the case in which the $e_i$'s are not necessarily integers, but arbitrary rational numbers. We observe that the "Suggested Matching" algorithm, with approximation factor $1 - 1/e$, easily generalizes to this case, as follows: instead of computing a maximum matching, we compute a maximum flow $f$ on the corresponding flow graph, and upon the arrival of an impression $i$, we assign $i$ to an ad $a$ with probability proportional to the fractional flow on the edge from $i$ to $a$. Given the total fraction $F_a$ on each ad $a$, we can argue that this algorithm achieves value $\sum_{a \in A}(1 - e^{-F_a})$ with high probability. Moreover, one can show that the optimum is at most $\sum_{a \in A} F_a$ with high probability. As a result, the approximation factor of the algorithm can be captured by the ratio

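$$\frac{\sum_{a \in A}\left(1 - e^{-F_a}\right)}{\sum_{a \in A} F_a},$$

where $0 \le F_a \le 1$ for all $a \in A$. Since $0 \le F_a \le 1$, we can characterize this bound as the solution to the following mathematical program:

$$\begin{aligned}
\min\;& \sum_{a \in A}\left(1 - e^{-F_a}\right) \\
\text{s.t.}\;& \sum_{a \in A} F_a = 1, \\
& 0 \le F_a \le 1 \quad \forall a \in A.
\end{aligned}$$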

This mathematical program can be solved analytically. Consider the vector of values $\Phi = (F_1, \ldots, F_{|A|})$ in nonincreasing order, and let $f(\Phi) = \sum_{a \in A}(1 - e^{-F_a})$. Since $f$ is a sum of a concave function applied to each coordinate, it is Schur-concave: for any vectors $\Phi_1$ and $\Phi_2$ with $\|\Phi_1\|_1 = \|\Phi_2\|_1 = 1$, if $\Phi_1$ majorizes $\Phi_2$, then $f(\Phi_1) \le f(\Phi_2)$. The feasible vector $(1, 0, \ldots, 0)$ majorizes every feasible vector, so the minimum of the program is $f(1, 0, \ldots, 0) = 1 - 1/e$; equivalently, $1 - e^{-x} \ge (1 - 1/e)\,x$ for $x \in [0, 1]$ by concavity. At the other extreme, the uniform vector $[1/|A|, \ldots, 1/|A|]$ is majorized by all feasible vectors and attains the maximum, $|A|(1 - e^{-1/|A|})$. This value equals $1 - 1/e$ when $|A| = 1$ and is increasing in $|A|$, since

$$\frac{d}{d|A|}\,|A|\left(1 - e^{-1/|A|}\right) = 1 - e^{-1/|A|} - \frac{e^{-1/|A|}}{|A|} > 0;$$

multiplying by $e^{1/|A|}$, this is equivalent to $e^{1/|A|} > 1 + 1/|A|$, which follows from the Maclaurin series expansion of $e^x$. Therefore, the approximation factor of the algorithm is $1 - 1/e$ with high probability.
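A numerical sanity check of the program (our sketch): random load vectors $F \in [0,1)^m$ never give a ratio $\sum_a (1 - e^{-F_a}) / \sum_a F_a$ below $1 - 1/e$, the vertex $(1, 0, \ldots, 0)$ attains $1 - 1/e$, and the uniform value $m(1 - e^{-1/m})$ increases with $m$. Checking the ratio directly avoids imposing $\sum_a F_a = 1$; `expm1` is used so that $1 - e^{-x}$ stays accurate for tiny $x$.

```python
import random
from math import e, expm1

def objective(F):
    """f(F) = sum over ads of 1 - e^{-F_a}, computed as -expm1(-F_a) for accuracy."""
    return sum(-expm1(-x) for x in F)

rng = random.Random(5)
worst = float("inf")
for _ in range(5000):
    F = [rng.random() for _ in range(rng.randint(1, 8))]   # random loads in [0, 1)
    if sum(F) > 0:
        worst = min(worst, objective(F) / sum(F))         # ratio for this load profile
print(worst, 1 - 1 / e)

vertex_value = objective([1.0, 0.0, 0.0])                      # the minimizing vertex
uniform_values = [m * (-expm1(-1 / m)) for m in range(1, 10)]  # uniform vector, m = |A|
```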

Generalizing the TSM algorithm to non-integer $e_i$'s requires a proper decomposition of the flow on the corresponding flow graph into two edge-disjoint flows, each with large value. Unlike the integral case, such an edge-disjoint decomposition is not possible for non-integer $e_i$'s, and one needs to exploit other ideas to analyze the algorithm. We leave this as an open question.

D Frequency Capping 

A useful generalization of the online matching problem that is well motivated by the ad allocation application is when the advertisers have "frequency caps", i.e., they do not want the same user to see their ad more than some fixed (constant) number of times. We can regard the user as a "feature" of the impression; i.e., an "impression" $i$ as we have used it in this paper is in fact a pair $(i, u)$, where $u$ is a particular user, and we have a distribution that gives us $e_{(i,u)}$, the expected number of impressions of each type from each user. Also as part of the input, we are given, for each advertiser $a$, a total number of impressions $d_a$ and a cap $c$ per user. We could also regard these caps as operating as impression limits on other features, e.g., demographic or geographic.

Our $(1 - 1/e)$-approximation algorithm (the suggested matching algorithm) from Section 4.1 is easily extended to this generalization of the problem. Here we give a sketch of this extension. For the algorithm, we simply add another layer $U$ of nodes to our max-flow computation, with one node $(a, u)$ for each (advertiser, user) pair. We add edges from each $a \in A$ to this layer with capacity $c$, and set the capacity of the edge $(s, a)$ to $d_a$. The algorithm proceeds as before, and one can easily show with the same argument that the number of impressions matched is approximately $F(1 - 1/e)$, where $F$ is the value of the flow. Then, by reasoning about the min-cut in this graph, with some simple reasoning about where this new layer sits in the min-cut, one can still show that OPT is bounded by $F$ with high probability, giving the desired approximation ratio.
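The layered network described above can be sketched as follows. This is our illustration. Several details are assumptions not fixed by the text: the builder `capped_network`, the node labels, the eligibility lists, and the use of $e_{(i,u)}$ as the capacity of the edges from the user layer to impressions. `max_flow` is a plain Edmonds-Karp routine. In the toy instance, one ad with $d_a = 5$ faces two users with three unit-rate impression types each, so with cap $c = 2$ the flow is limited to $2 + 2 = 4$.

```python
from collections import defaultdict, deque

def max_flow(cap, s, t):
    """Edmonds-Karp on a dict-of-dicts capacity map; returns the max-flow value."""
    res = defaultdict(lambda: defaultdict(int))
    for u, out in cap.items():
        for v, c in out.items():
            res[u][v] += c
            res[v][u] += 0                 # register the reverse arc
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(res[u][v] for u, v in path)
        for u, v in path:
            res[u][v] -= b
            res[v][u] += b
        flow += b

def capped_network(d, c, eligible, rate):
    """Hypothetical builder for the frequency-capped flow graph.

    d[a]: impression goal of ad a (capacity of (s, a)); c: per-user cap
    (capacity of a -> (a, u)); eligible[a]: impression types (i, u) that ad a
    may receive; rate[(i, u)]: expected arrivals, used as the capacity of
    ((i, u), t) and, as an extra assumption, of (a, u) -> (i, u).
    """
    cap = defaultdict(dict)
    for a, goal in d.items():
        cap["s"][("ad", a)] = goal
        for i, u in eligible[a]:
            cap[("ad", a)][("user", a, u)] = c
            cap[("user", a, u)][("imp", i, u)] = rate[(i, u)]
            cap[("imp", i, u)]["t"] = rate[(i, u)]
    return cap

rate = {(i, u): 1 for u in ("u1", "u2") for i in ("i1", "i2", "i3")}
net = capped_network({"a1": 5}, 2, {"a1": list(rate)}, rate)
print(max_flow(net, "s", "t"))   # 4: two users, at most c = 2 impressions each
```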

Interestingly, it is more challenging to generalize the TSM algorithm. Setting the capacities of the top-layer edges and the middle-layer edges to $2d_a$ and 2, respectively, does not work as desired, since then the flow could be spread among more than $d_a$ nodes in the middle layer.